DSA 554 3.0 Spatio-temporal Data Analysis

Dr. Thiyanga S. Talagala
Department of Statistics, Faculty of Applied Sciences
University of Sri Jayewardenepura, Sri Lanka

Time Series

A time series is a sequence of observations taken sequentially in time.

Cross sectional data

Observations that come from different individuals or groups at a single point in time.

Time series data

A set of observations, along with some information about what times those observations were recorded.

Time Series Patterns

Trend

Long-term increase or decrease in the data.

Seasonal

  • A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).

  • Seasonality is always of a fixed and known period.

Cyclic

  • A cyclic pattern exists when data exhibit rises and falls that are not of fixed period.

  • The duration of these fluctuations is usually of at least 2 years.

  • The average length of cycles is longer than the length of a seasonal pattern.

Frequency of a time series: Seasonal periods

Frequency: the number of observations per natural time interval (usually a year, but sometimes a week, a day, or an hour).

Your turn:

Identify frequencies for the following:

  • Annual

  • Quarterly

  • Monthly

  • Weekly

Multiple Seasonal Patterns

For example, the hourly utility demand data exhibit both daily and weekly cycles.

Source: https://www.robjhyndman.com/papers/multiseasonal.pdf

Your turn

Identify multiple frequencies for the following:

  • Daily

  • Hourly

  • Half-hourly

  • Minutes

  • Seconds

Correlation vs Autocorrelation

Correlation

  • Measures the strength of the linear relationship between two variables

\[r = \frac{\sum_{i=1}^{n} (x_i -\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i -\bar{x})^2 \sum_{i=1}^{n} (y_i -\bar{y})^2}}\]

Autocorrelation

Measures the strength of the linear relationship between lagged values of a time series.

\[r_k = \frac{\sum_{t=k+1}^{T} (y_t -\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T} (y_t -\bar{y})^2}\]
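As a quick check of the formula, here is a minimal sketch (the series y is our own simulated example) that computes \(r_k\) directly and compares it with statsmodels' acf:

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
y = rng.normal(size=200).cumsum()   # toy series, for illustration only

def r_k(y, k):
    # Sample autocorrelation at lag k, exactly as in the formula above
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

print(r_k(y, 1))            # manual computation
print(acf(y, nlags=1)[1])   # statsmodels; should agree closely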

Lagged values: in-class

Autocorrelation plot

  1. Time series with trend only

  2. Time series with seasonality only

  3. Time series with trend and seasonality

Examples 1–3: time series plots, seasonal plots, and ACF plots for the three cases above (figures omitted).

Exercise

Question 6 at https://otexts.com/fpp2/graphics-exercises.html

Notation

\[\hat{Y}_{T+h|T}\]

\(\hat{Y}_{T+h|T}\) denotes the \(h\)-step-ahead forecast of \(Y_{T+h}\), made using all information available up to time \(T\).

Simple time series forecasting techniques

  1. Average method

  2. Naive method / random walk method

  3. Seasonal naive method

  4. Drift method
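A minimal sketch of the four methods, written directly from their definitions (the function name and the seasonal-period argument m are our own choices; assumes a pandas Series y):

import numpy as np
import pandas as pd

def simple_forecasts(y, h, m=12):
    # h-step-ahead forecasts; y is a pd.Series, m is the seasonal period
    T = len(y)
    mean_fc = np.repeat(y.mean(), h)                     # average method
    naive_fc = np.repeat(y.iloc[-1], h)                  # naive / random walk
    snaive_fc = np.array([y.iloc[T - m + ((i - 1) % m)]  # seasonal naive
                          for i in range(1, h + 1)])
    drift_fc = y.iloc[-1] + np.arange(1, h + 1) * (      # drift
        (y.iloc[-1] - y.iloc[0]) / (T - 1))
    return pd.DataFrame({'mean': mean_fc, 'naive': naive_fc,
                         'snaive': snaive_fc, 'drift': drift_fc})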

Reading

Electricity Demand

       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1980  7366  7414  7824  7524  8279  8707  9486  8973  8231  8206  7927  7999
1981  7834  7521  8284  7999  8940  9381 10078  9796  8471  8572  8150  8168
1982  8166  7903  8606  8071  9178  9873 10476  9296  8818  8697  8381  8293
1983  7942  8001  8744  8397  9115  9773 10358  9849  9083  9143  8800  8741
1984  8492  8795  9354  8796 10072 10174 11326 10744  9806  9740  9373  9244
1985  9407  8827  9880  9364 10580 10899 11687 11280 10208 10212  9725  9721
1986  9846  9407 10265  9970 10801 11246 12167 11578 10645 10613 10104 10348
1987 10263  9973 10803 10409 11458 11845 12559 12070 11221 11338 10761 11012
1988 10923 10790 11427 10788 11772 12104 12634 12772 11764 11956 11646 11750
1989 11485 11198 12265 11704 12419 13259 13945 13839 12387 12546 12038 11977
1990 12336 11793 12877 11923 13306 13988 14002 14336 12867 12721 12449 12686
1991 12810 12015 12888 12431 13499 13014 14296 14125 12817 12862 12449 12489
1992 12621 12380 13023 12302 13339 13825 14428 14151 13355 13094 12656 12435
1993 13287 12434 13209 12817 13746 14259 14590 14354 13254 13464 13302 13456
1994 13171 12517 13489 12509 13785 13921 14603 14749 13540 13457 13243 13590
1995 13487 12776 13812 13032 14268 14473 15359 14457                        

Statistical Properties

Mean function

Let \(\{X_1, X_2, \ldots\}\) be a sequence of time-indexed random variables.

The mean function of \(\{X_t\}\) is

\[\mu_X(t)=E(X_t).\]

Covariance function

The covariance function of \(\{X_t\}\) is

\[\gamma_X(r, s)=Cov(X_r, X_s)=E[(X_r-\mu_X(r))(X_s-\mu_X(s))]\]

for all integers \(r\) and \(s\).

For a stationary series, the covariance function of \(\{X_t\}\) at lag \(h\) is defined by \[\gamma_X(h):=\gamma_X(h, 0)=\gamma_X(t+h, t)=Cov(X_{t+h}, X_t).\]

Autocovariance function

The autocovariance function of \(\{X_t\}\) at lag \(h\) is

\[\gamma_X(h)=Cov(X_{t+h}, X_t).\]

Autocorrelation function

The autocorrelation function of \(\{X_t\}\) at lag \(h\) is

\[\rho_X(h)=\frac{\gamma_X(h)}{\gamma_X(0)}=Cor(X_{t+h}, X_t).\]
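As a simple worked example: for a sequence of uncorrelated random variables with mean zero and variance \(\sigma^2\) (white noise, defined below), these definitions give

\[ \gamma_X(h)=\begin{cases} \sigma^2, & \text{if $h=0$} \newline 0, & \text{otherwise}, \end{cases} \qquad \rho_X(h)=\begin{cases} 1, & \text{if $h=0$} \newline 0, & \text{otherwise}. \end{cases} \]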

Weakly stationary

A time series \(\{X_t\}\) is called weakly stationary if

  • \(\mu_X(t)\) is independent of \(t\).

  • \(\gamma_X(t+h, t)\) is independent of \(t\) for each \(h\).

In other words, the statistical properties of the time series (mean, variance, autocorrelation, etc.) do not depend on the time at which the series is observed; that is, there is no trend or seasonality. However, a time series with cyclic behaviour (but with no trend or seasonality) is stationary.

Strict stationarity of a time series

A time series \(\{X_t\}\) is called strictly stationary if the random vectors \((X_1, X_2, \ldots, X_n)\) and \((X_{1+h}, X_{2+h}, \ldots, X_{n+h})\) have the same joint distribution for all integers \(h\) and all \(n > 0\).

Simple time series models

1. iid noise

  1. no trend or seasonal component

  2. observations are independent and identically distributed (iid) random variables with zero mean.

  3. Notation: \(\{X_t\} \sim IID(0, \sigma^2)\)

  4. plays an important role as a building block for more complicated time series.

import numpy as np
import matplotlib.pyplot as plt

# Simulate iid noise: independent N(0, 1) draws
mean = 0
std = 1
num_samples = 1000
samples = np.random.normal(mean, std, size=num_samples)
samples

2. White noise

If \(\{X_t\}\) is a sequence of uncorrelated random variables, each with zero mean and variance \(\sigma^2\), then such a sequence is referred to as white noise, denoted \(\{X_t\} \sim WN(0, \sigma^2)\).

Note: Every \(IID(0, \sigma^2)\) sequence is \(WN(0, \sigma^2)\), but not conversely.
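To see why the converse fails, here is a sketch of a classic textbook-style counterexample (the code is our own): with \(Z_t \sim IID\,N(0,1)\), set \(X_t = Z_t\) for odd \(t\) and \(X_t = (Z_{t-1}^2-1)/\sqrt{2}\) for even \(t\). Every \(X_t\) has mean 0 and variance 1 and all autocorrelations vanish, yet consecutive terms are dependent:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)

x = z.copy()
# every second position is a deterministic function of the term before it
x[1::2] = (z[0::2] ** 2 - 1) / np.sqrt(2)

# uncorrelated at lag 1 (white noise) ...
print(np.corrcoef(x[:-1], x[1:])[0, 1])          # approximately 0
# ... but clearly not independent:
print(np.corrcoef(x[0::2] ** 2, x[1::2])[0, 1])  # exactly 1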

import numpy
import matplotlib.pyplot as plt

mean = 0
std = 1 
num_samples = 100
samples = numpy.random.normal(mean, std, size=num_samples)
plt.plot(samples)
plt.show()

ACF

import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# ACF of the white noise series: spikes should fall within the confidence band
plot_acf(samples, lags=20)
plt.show()

White noise implies stationarity. Stationarity does not imply white noise.

3. Random walk

A random walk process is obtained by cumulatively summing iid random variables. If \(\{S_t,\ t=0, 1, 2, \ldots\}\) is a random walk process, then \(S_0 = 0\) and

\[ \begin{aligned} S_1 &= 0 + X_1 \\ S_2 &= 0 + X_1 + X_2 \\ &\vdots \\ S_t &= X_1 + X_2 + \cdots + X_t. \end{aligned} \]

Question

Is \(\{S_t,\ t=0, 1, 2, \ldots\}\) a weakly stationary process?

Identifying non-stationarity in the mean

  • Using time series plot

  • ACF plot

    • The ACF of a stationary time series drops to zero relatively quickly.

    • The ACF of a non-stationary series decreases slowly.

    • For a non-stationary series, the ACF at lag 1 is often large and positive.

Elimination of Trend and Seasonality by Differencing

  • Differencing helps to stabilize the mean.

Backshift notation:

\[BX_t=X_{t-1}\]

Ordinary differencing

The first-order differencing can be defined as

\[\nabla X_t = X_t-X_{t-1}=X_t-BX_t=(1-B)X_t,\] where \(\nabla=1-B\).

The second-order differencing

\[\nabla^2X_t=\nabla(\nabla X_t)=\nabla(X_t-X_{t-1})=\nabla X_t - \nabla X_{t-1}\]

\[\nabla X_t - \nabla X_{t-1}=(X_t-X_{t-1})-(X_{t-1}-X_{t-2})=X_t-2X_{t-1}+X_{t-2}\]

  • In practice, we seldom need to go beyond second-order differencing.
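A small sketch (toy numbers, our own example) verifying that repeated .diff() in pandas matches the expansion above:

import pandas as pd

x = pd.Series([3.0, 5.0, 4.0, 8.0, 7.0, 11.0])

d2_via_diff = x.diff().diff()                  # second-order differencing
d2_expanded = x - 2 * x.shift(1) + x.shift(2)  # X_t - 2X_{t-1} + X_{t-2}

print(d2_via_diff.equals(d2_expanded))         # True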

Seasonal differencing

  • differencing between an observation and the corresponding observation from the previous year.

\[\nabla_mX_t=X_t-X_{t-m}=(1-B^m)X_t\] where \(m\) is the number of seasons. For monthly data \(m=12\); for quarterly data \(m=4\).

For monthly series

\[\nabla_{12}X_t=X_t-X_{t-12}\]

Twice-differenced series (a seasonal difference followed by a first difference)

\[\nabla\nabla_{12}X_t=\nabla_{12}X_t-\nabla_{12}X_{t-1}\] \[\nabla_{12}X_t-\nabla_{12}X_{t-1}=(X_t-X_{t-12})-(X_{t-1}-X_{t-13})\]

If seasonality is strong, the seasonal differencing should be done first.
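In pandas, seasonal differencing is .diff(m); a minimal sketch on a toy monthly series (our own illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# four years of a toy monthly series: trend + seasonal pattern + noise
y = pd.Series(np.arange(48)
              + 10 * np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 4)
              + rng.normal(size=48))

seasonal_diff = y.diff(12)       # seasonal difference: X_t - X_{t-12}
twice_diff = y.diff(12).diff()   # seasonal then first difference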

Non-Stationary Time Series

1. Deterministic trend

\[Y_t = f(t) + \epsilon_t\]

where \(\epsilon_t \sim iid(0, \sigma^2)\) and \(t = 1, 2, \ldots, T\).

The mean of the process is time dependent, but the variance of the process is constant.

A trend is deterministic if it is a nonrandom function of time.
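A quick simulation sketch (toy parameters of our own, with \(f(t) = 2 + 0.5t\)):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
t = np.arange(1, 201)
y = 2 + 0.5 * t + rng.normal(scale=5, size=t.size)  # f(t) plus iid noise

plt.plot(t, y)
plt.title('Deterministic linear trend plus iid noise')
plt.show()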

Non-Stationary Time Series (cont.)

2. Random walk

\[Y_t = Y_{t-1} + \epsilon_t\]

  • Random walk has a stochastic trend.

  • Model behind naive method.

A trend is said to be stochastic if it is a random function of time.

Non-Stationary Time Series (cont.)

3. Random walk with drift

\[Y_t = \alpha+ Y_{t-1} + \epsilon_t\]

  • Random walk with drift has a stochastic trend and a deterministic trend.

  • Model behind drift method.

Random walk

\[ \begin{aligned} Y_t &= Y_{t-1} + \epsilon_t \\ Y_1 &= Y_0 + \epsilon_1 \\ Y_2 &= Y_1 + \epsilon_2=Y_0 + \epsilon_1 + \epsilon_2\\ Y_3 &= Y_2 + \epsilon_3=Y_0 + \epsilon_1 + \epsilon_2 +\epsilon_3\\ &\vdots \\ Y_t &=Y_{t-1} + \epsilon_t=Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 +\cdots+ \epsilon_t = Y_0 + \sum_{i=1}^{t} \epsilon_i \end{aligned} \]

Mean: \(E(Y_t) = Y_0\).

Variance: \(Var(Y_t)=t \sigma^2\).

Random walk with drift

\[ \begin{aligned} Y_t &= \alpha + Y_{t-1} + \epsilon_t \\ Y_1 &= \alpha+Y_0 + \epsilon_1 \\ Y_2 &= \alpha+ Y_1 + \epsilon_2=2 \alpha+Y_0 + \epsilon_1 + \epsilon_2\\ Y_3 &= \alpha+ Y_2 + \epsilon_3= 3 \alpha+ Y_0 + \epsilon_1 + \epsilon_2 +\epsilon_3\\ &\vdots \\ Y_t &= \alpha+Y_{t-1} + \epsilon_t= t \alpha+ Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 +\cdots+ \epsilon_t \\ Y_t &= t \alpha + Y_0 + \sum_{i=1}^{t} \epsilon_i \end{aligned} \]

Random walk with drift (cont.)

It has a deterministic trend \((Y_0 + t \alpha)\) and a stochastic trend \(\sum_{i=1}^{t} \epsilon_i\).

Mean: \(E(Y_t) = Y_0 + t\alpha\)

Variance: \(Var(Y_t) = t\sigma^2\).

Hence both the mean and the variance of the process change over time.
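A simulation sketch (the values of \(\alpha\) and \(\sigma\) are our own) contrasting a pure random walk with a drifted one:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
eps = rng.normal(size=300)
alpha = 0.2

rw_path = np.cumsum(eps)             # Y_t = Y_{t-1} + eps_t, with Y_0 = 0
drift_path = np.cumsum(alpha + eps)  # Y_t = alpha + Y_{t-1} + eps_t

plt.plot(rw_path, label='random walk')
plt.plot(drift_path, label='random walk with drift')
plt.legend()
plt.show()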

Notation: I(d)

Integrated of order \(d\): the series can be made stationary by differencing \(d\) times.

  • Known as an \(I(d)\) process.

Question: Show that the random walk process is an \(I(1)\) process.

The random walk is called a unit root process: in backshift form it is \((1-B)Y_t = \epsilon_t\), and the AR polynomial \(1-z\) has a root equal to one. (In general, if one of the roots of the AR polynomial equals one, the process is called a unit root process.)
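In practice, a unit root can be checked with the augmented Dickey-Fuller test; a minimal sketch using statsmodels on a freshly simulated walk:

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=500))  # simulated random walk

# Null hypothesis: the series has a unit root (is non-stationary)
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(walk)
print(f'ADF p-value: {pvalue:.3f}')     # large: fail to reject the unit root

# Differencing once should remove the unit root, consistent with I(1)
stat_d, pvalue_d, *_ = adfuller(np.diff(walk))
print(f'after differencing: p-value = {pvalue_d:.3f}')  # small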

Random walk

import numpy as np

# Cumulatively summing the white noise gives a random walk
rw = np.cumsum(samples)
plt.plot(rw)
plt.show()

Random walk - ACF

# Slowly decaying ACF: a sign of non-stationarity
plot_acf(rw, lags=20)
plt.show()

Difference series

# First and second ordinary differences of the random walk
df = pd.DataFrame(rw, columns=['Values'])
df['Diff 1'] = df['Values'].diff()
df['Diff 2'] = df['Values'].diff().diff()
df

Plot the first-differenced series

plt.plot(df['Values'].diff())
plt.show()

ACF of the first-differenced series

diff = df['Diff 1']
plot_acf(diff.dropna(), lags=20)
plt.show()

Example 2

import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize':(9,7), 'figure.dpi':120})

# Import data
df = pd.read_csv('wwwusage.csv', names=['value'], header=0)

# Original Series
fig, axes = plt.subplots(2, 2, sharex=True)
axes[0, 0].plot(df.value); axes[0, 0].set_title('Original Series')
plot_acf(df.value, ax=axes[0, 1], lags=np.arange(len(df)))

# 1st Differencing
axes[1, 0].plot(df.value.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(df.value.diff().dropna(), ax=axes[1, 1], lags=np.arange(len(df) - 1))
plt.show()


2nd order differencing

plot_acf(df.value.diff().diff().dropna())
plt.show()

Variance stabilization

E.g.:

  • Square root: \(W_t = \sqrt{Y_t}\)

  • Logarithm: \(W_t = \log(Y_t)\)

    • This is very useful.

    • Interpretable: changes in a log value are relative (percent) changes on the original scale.

Monthly Airline Passenger Numbers 1949-1960

import pandas as pd
from plotnine import ggplot, aes, geom_line

# Monthly totals of international airline passengers, 1949-1960
airpassenger = pd.read_csv('AirPassengers.csv')
airpassenger['Month'] = pd.to_datetime(airpassenger['Month'])
ggplot(airpassenger, aes(x='Month', y='#Passengers')) + geom_line()

Monthly Airline Passenger Numbers 1949-1960 - log

import numpy as np

# The log transformation stabilizes the growing seasonal variance
airpassenger['naturallog'] = np.log(airpassenger['#Passengers'])
ggplot(airpassenger, aes(x='Month', y='naturallog')) + geom_line()

Box-Cox transformation

\[ w_t=\begin{cases} \log(y_t), & \text{if $\lambda=0$} \newline (y_t^\lambda - 1)/ \lambda, & \text{otherwise}. \end{cases} \]

Different values of \(\lambda\) give you different transformations.

  • \(\lambda=1\): No substantive transformation

  • \(\lambda = \frac{1}{2}\): Square root plus linear transformation

  • \(\lambda=0\): Natural logarithm

  • \(\lambda = -1\): Inverse plus 1

A good value of \(\lambda\) balances the seasonal fluctuations and random variation across the series.

Box-Cox transformation

# import modules
import numpy as np
from scipy import stats

# stats.boxcox transforms the series and estimates lambda by maximum likelihood
y2, fitted_lambda = stats.boxcox(airpassenger['#Passengers'])

Box-Cox transformation: Exploring the output

fitted_lambda   # the estimated lambda
y2              # the transformed series
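Values on the transformed scale must be back-transformed to the original scale; a sketch using scipy's inverse Box-Cox:

from scipy.special import inv_boxcox
import numpy as np

# Back-transform: recovers the original passenger counts
y_original = inv_boxcox(y2, fitted_lambda)
print(np.allclose(y_original, airpassenger['#Passengers']))  # True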

Linear Time Series Analysis

Autoregressive Models

\(Y_t = \alpha + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t\)

where:

  • \(Y_t\) is the value at time \(t\),

  • \(\alpha\) is a constant,

  • \(\phi_1, \phi_2, \ldots, \phi_p\) are the parameters,

  • \(\epsilon_t\) is white noise (the error term),

  • \(p\) is the order of the AR model.
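A simulation sketch using statsmodels' ArmaProcess (note its sign convention: the AR lag polynomial is supplied as \([1, -\phi_1, \ldots, -\phi_p]\)):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

# AR(1) with phi_1 = 0.7; the MA part is just [1]
ar1 = ArmaProcess(ar=np.array([1, -0.7]), ma=np.array([1]))
y = ar1.generate_sample(nsample=500)

plt.plot(y)
plt.title('Simulated AR(1) with phi = 0.7')
plt.show()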

In-class: Properties of AR(1) process

Derive

  • Mean

  • Variance

  • Covariance

  • Autocorrelation function of an AR(1) process
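For reference, a sketch of the standard AR(1) results (assuming stationarity, i.e. \(|\phi_1| < 1\)):

\[ \begin{aligned} E(Y_t) &= \mu = \frac{\alpha}{1-\phi_1}, \\ \gamma(0) &= Var(Y_t) = \frac{\sigma^2}{1-\phi_1^2}, \\ \gamma(h) &= \phi_1^{h}\gamma(0), \qquad \rho(h) = \phi_1^{h}, \quad h = 0, 1, 2, \ldots \end{aligned} \]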

In-class: Properties of AR(2) process

Derive

  • Mean

  • Variance

  • Covariance

  • Autocorrelation function of an AR(2) process

In-class: Properties of AR(p) process

Derive

  • Mean

  • Variance

  • Covariance

  • Autocorrelation function of an AR(p) process